NCBI/GenBank BLAST Output XML Parser Tool
Authors
Abstract
We describe a small, freely available script that extracts ‘real world’ sequence descriptions from BLASTX results produced by the stand-alone ncbiblast2.2.26 suite of tools (available from NCBI/GenBank). Our Python (2.7) script is intended to make name extraction feasible for thousands, or hundreds of thousands, of sequences, such as those generated by BLASTX analysis of cDNAs obtained by RNA-Seq (transcriptome) in next-generation sequencing (NGS) experiments. The script makes the large BLASTX output of a transcriptome experiment accessible to familiar tools such as Microsoft Excel or LibreOffice Calc. It was written and tested on the Linux operating system (Ubuntu 12.04 LTS), but should work in any Python 2.7 compatible environment. We include some example files and help documentation.
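The kind of extraction described above can be sketched as follows. This is not the authors' script: it is a minimal illustration that assumes the BLASTX run was saved in NCBI's XML output format, and the function names (`top_hit_rows`, `write_tsv`), the column choice, and the sample record are hypothetical. It is written for Python 3 rather than the Python 2.7 the abstract targets.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tiny made-up BLAST XML fragment (illustrative data only, not real results).
SAMPLE_XML = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>contig_0001</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>heat shock protein 70 [Example organism]</Hit_def>
          <Hit_accession>XP_000001</Hit_accession>
          <Hit_hsps><Hsp><Hsp_evalue>1e-50</Hsp_evalue></Hsp></Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_query-def>contig_0002</Iteration_query-def>
      <Iteration_hits></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""


def top_hit_rows(xml_source):
    """Yield (query, description, accession, evalue) for each query's first hit."""
    # iterparse streams the file, so very large outputs need not fit in memory.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Iteration":
            continue
        query = elem.findtext("Iteration_query-def", default="")
        hit = elem.find("Iteration_hits/Hit")  # first hit listed, if any
        if hit is None:
            yield (query, "no hit", "", "")
        else:
            yield (
                query,
                hit.findtext("Hit_def", default=""),
                hit.findtext("Hit_accession", default=""),
                hit.findtext("Hit_hsps/Hsp/Hsp_evalue", default=""),
            )
        elem.clear()  # release the finished <Iteration> subtree


def write_tsv(rows, handle):
    """Write rows as tab-separated text that a spreadsheet can open."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["query", "description", "accession", "evalue"])
    writer.writerows(rows)


rows = list(top_hit_rows(io.BytesIO(SAMPLE_XML.encode("utf-8"))))
out = io.StringIO()
write_tsv(rows, out)
print(out.getvalue())
```

For a real run, `top_hit_rows` could be given the path of the XML file instead of the in-memory sample, and `write_tsv` a file opened for writing; the resulting tab-separated file opens directly in Microsoft Excel or LibreOffice Calc.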
Similar resources
Zerg: A Very Fast BLAST Parser Library
SUMMARY Zerg is a library of sub-routines that parses the output from all NCBI BLAST programs (Blastn, Blastp, Blastx, Tblastn and Tblastx) and returns the attributes of a BLAST report to the user. It is optimized for speed, being especially useful for large-scale genomic analysis. Benchmark tests show that Zerg is over two orders of magnitude faster than some widely used BLAST parsers. AVAIL...
Database resources of the National Center for Biotechnology Information
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, B...
Database resources of the National Center for Biotechnology Information: update
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's website. NCBI resources include Entrez, PubMed, PubMed Central, LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic...
Database resources of the National Center for Biotechnology
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, PubMed, PubMed Central (PMC), LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Elec...
NOBLAST and JAMBLAST: New Options for BLAST and a Java Application Manager for BLAST results
NOBLAST (New Options for BLAST) is an open source program that provides a new user-friendly tabular output format for various NCBI BLAST programs (Blastn, Blastp, Blastx, Tblastn, Tblastx, Mega BLAST and Psi BLAST) without any use of a parser and provides E-value correction in case of use of segmented BLAST database. JAMBLAST using the NOBLAST output allows the user to manage, view a...
Publication date: 2013